Mako hSDM BRT explore (CRW PAs)

Author

Emily Nazario

Published

July 15, 2024

On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps.

The majority of the predictors included in the following models are at a daily temporal resolution. However, for the DO and AGI models, we also investigated the inclusion of these two predictors at seasonal and annual temporal resolutions. The remaining environmental predictors are also available at these resolutions, and can included in follow-up models.

The pseudo absences used in these models were generated using correlated random walk approaches, but another quarto document includes models with background sampling pseudo absences. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.

base_Nspat_Ntag <- readRDS(brt_outputs[7])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5736928
Correlation        0.8285057
AUC                0.9664000
Per.Expl          58.6164539
cvDeviance         0.8622091
cvCorrelation      0.6620174
cvAUC              0.8766500
cvPer.Expl        37.8042183
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ntag) 

             rel.inf
bathy_mean 24.425351
temp_mean  18.087655
sal_mean   10.919911
chl_mean    9.336627
vostr_mean  6.686538
ssh_mean    6.168030
mld_mean    6.048063
bathy_sd    5.587410
uo_mean     3.417308
vo_mean     3.364026
uostr_mean  3.297190
pred_var    2.661890
#explore partial plots
gbm.plot(base_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ntag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   952.17
2          3   sal_mean          2  temp_mean   710.45
3          8   ssh_mean          2  temp_mean   512.92
4          6    vo_mean          3   sal_mean   417.59
5         10 bathy_mean          3   sal_mean   368.97
6          2  temp_mean          1   chl_mean   365.72
7         10 bathy_mean          8   ssh_mean   278.76
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ntag, dat_test_base,
                     n.trees = base_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.9610641
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6752828
max(e@TPR + e@TNR -1) #TSS
[1] 0.5568553
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ntag)
[1] 58.61645
#eval 75/25
eval_7525_modified(base_Nspat_Ntag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3977416 0.6213903 0.8517752  1.136004         0.3067274 0.5861645
base_Nspat_Ytag <- readRDS(brt_outputs[8])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.2305879
Correlation        0.9503452
AUC                0.9979000
Per.Expl          83.3664540
cvDeviance         0.4538959
cvCorrelation      0.8528183
cvAUC              0.9680300
cvPer.Expl        67.2580483
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ytag) 

              rel.inf
tag        44.1894066
bathy_mean 17.5965698
temp_mean  11.0672136
sal_mean    6.8107330
chl_mean    4.6665128
ssh_mean    3.6780850
vostr_mean  3.5179667
mld_mean    2.5624157
bathy_sd    1.8533773
uostr_mean  1.2374256
vo_mean     1.1506546
uo_mean     1.0981552
pred_var    0.5714842
#explore partial plots
gbm.plot(base_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ytag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          4   sal_mean          1        tag  1496.60
2          3  temp_mean          1        tag  1036.49
3         11 bathy_mean          1        tag   988.07
4          2   chl_mean          1        tag   512.43
5          9   ssh_mean          1        tag   369.40
6         11 bathy_mean          3  temp_mean   274.54
7          8 vostr_mean          1        tag   231.62
8         10   mld_mean          1        tag   207.08
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ytag, dat_test_base,
                     n.trees = base_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.8926455
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6986152
max(e@TPR + e@TNR -1) #TSS
[1] 0.6791528
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ytag)
[1] 83.36645
#eval 75/25
eval_7525_modified(base_Nspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3642804 0.7198741 0.8981109  1.210241         0.3560819 0.8336645
base_Yspat_Ytag <- readRDS(brt_outputs[9])

# model performance 
ggBRT::ggPerformance(base_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.1773819
Correlation        0.9648516
AUC                0.9992000
Per.Expl          87.2044910
cvDeviance         0.3814392
cvCorrelation      0.8803459
cvAUC              0.9770700
cvPer.Expl        72.4847405
#relative influence of predictors
ggBRT::ggInfluence(base_Yspat_Ytag) 

              rel.inf
tag        40.2206057
dist_coast 24.0608862
lat         9.4741336
temp_mean   4.9067106
bathy_mean  4.6525961
sal_mean    4.4004394
chl_mean    3.5004953
vostr_mean  2.4769759
ssh_mean    1.7393792
mld_mean    1.7041726
bathy_sd    0.8436843
uo_mean     0.5521552
vo_mean     0.5356691
uostr_mean  0.5120463
pred_var    0.4200505
#explore partial plots
gbm.plot(base_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Yspat_Ytag)
base_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   883.10
2           5   sal_mean          1        tag   512.27
3           4  temp_mean          1        tag   484.21
4           3   chl_mean          1        tag   478.25
5          14 dist_coast          1        tag   452.76
6          12 bathy_mean          1        tag   425.15
7          11   mld_mean          1        tag   201.34
8          14 dist_coast          2        lat   160.15
9          14 dist_coast          4  temp_mean   153.48
10         10   ssh_mean          1        tag   149.67
11         15   pred_var          1        tag   139.75
#predictive performance using test dataset 
preds <- predict.gbm(base_Yspat_Ytag, dat_test_base,
                     n.trees = base_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.8947524
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7021335
max(e@TPR + e@TNR -1) #TSS
[1] 0.6867588
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Yspat_Ytag)
[1] 87.20449
#eval 75/25
eval_7525_modified(base_Yspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3604686 0.7321882 0.9049214  1.219883          0.354562 0.8720449

DO models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).

do_0m_Nspat_Ytag <- readRDS(brt_outputs[14])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3408774
Correlation        0.9058550
AUC                0.9897000
Per.Expl          75.4108818
cvDeviance         0.5312851
cvCorrelation      0.8146489
cvAUC              0.9559100
cvPer.Expl        61.6758566
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ytag) 

              rel.inf
tag        40.9080062
bathy_mean 23.7019091
o2_mean_0m 14.1885697
temp_mean   4.2249870
chl_mean    3.4047577
sal_mean    2.9679120
ssh_mean    2.7533054
vostr_mean  1.6135297
mld_mean    1.4718496
bathy_sd    1.0656363
pred_var    1.0330993
uostr_mean  0.9371363
vo_mean     0.9056197
uo_mean     0.8236821
#explore partial plots
gbm.plot(do_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12 bathy_mean          1        tag  1100.34
2           2 o2_mean_0m          1        tag   577.24
3           5   sal_mean          1        tag   519.34
4           4  temp_mean          1        tag   249.16
5          10   ssh_mean          1        tag   212.28
6          12 bathy_mean          4  temp_mean   154.44
7           3   chl_mean          1        tag   153.94
8          13   bathy_sd          1        tag   126.85
9          11   mld_mean          1        tag   117.52
10         14   pred_var          1        tag   104.95
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9763715
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6853382
max(e@TPR + e@TNR -1) #TSS
[1] 0.6121474
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ytag)
[1] 75.41088
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3833426 0.6762378 0.8716359  1.194065         0.2956827 0.7541088
do_0m_Yspat_Ytag <- readRDS(brt_outputs[15])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3169398
Correlation        0.9131868
AUC                0.9912000
Per.Expl          77.1376144
cvDeviance         0.4910827
cvCorrelation      0.8306125
cvAUC              0.9627200
cvPer.Expl        64.5758521
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ytag) 

              rel.inf
tag        39.3243042
dist_coast 23.4675901
o2_mean_0m  9.2630160
bathy_mean  7.6602469
lat         7.4436658
temp_mean   2.1578941
chl_mean    2.1270429
sal_mean    2.0922701
ssh_mean    1.4504847
mld_mean    1.0484217
pred_var    0.9808561
vostr_mean  0.7765505
bathy_sd    0.6166353
uostr_mean  0.5730787
vo_mean     0.5270670
uo_mean     0.4908760
#explore partial plots
gbm.plot(do_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   537.19
2          13 bathy_mean          1        tag   441.38
3           3 o2_mean_0m          1        tag   378.08
4          15 dist_coast          1        tag   190.11
5           6   sal_mean          1        tag   188.58
6           4   chl_mean          1        tag   144.71
7          11   ssh_mean          1        tag   112.11
8           5  temp_mean          1        tag    96.88
9          16   pred_var          1        tag    90.22
10         14   bathy_sd          1        tag    83.50
11          3 o2_mean_0m          2        lat    75.98
12         12   mld_mean          1        tag    69.54
13          5  temp_mean          2        lat    62.79
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9273483
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6914743
max(e@TPR + e@TNR -1) #TSS
[1] 0.6400517
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ytag)
[1] 77.13761
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3722069 0.697656 0.8838555  1.186513         0.3310462 0.7713761
do_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[13])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3275756
Correlation        0.9101734
AUC                0.9904000
Per.Expl          76.3704047
cvDeviance         0.5211133
cvCorrelation      0.8185662
cvAUC              0.9575900
cvPer.Expl        62.4096016
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ytag) 

               rel.inf
tag         39.7630992
bathy_mean  22.3644729
o2_mean_0m  13.1609254
o2_mean_60m  5.7538384
chl_mean     3.4719050
temp_mean    3.0814247
sal_mean     2.8796698
ssh_mean     2.4945921
mld_mean     1.4530477
vostr_mean   1.2564378
pred_var     1.0243164
bathy_sd     0.9583521
vo_mean      0.8437129
uostr_mean   0.8075706
uo_mean      0.6866347
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ytag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1          12  bathy_mean          1        tag  1007.87
2           2  o2_mean_0m          1        tag   483.04
3           5    sal_mean          1        tag   376.66
4          14 o2_mean_60m          1        tag   244.01
5           4   temp_mean          1        tag   223.33
6          10    ssh_mean          1        tag   169.05
7           3    chl_mean          1        tag   158.22
8          13    bathy_sd          1        tag   133.89
9          12  bathy_mean          4  temp_mean    98.89
10         11    mld_mean          1        tag    89.75
11          7  uostr_mean          1        tag    85.56
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9728905
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.686591
max(e@TPR + e@TNR -1) #TSS
[1] 0.6135252
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ytag)
[1] 76.3704
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained PseudoR2
1 0.3825586 0.6793962 0.874085  1.197922         0.2981937 0.763704
do_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[10])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3181841
Correlation        0.9124632
AUC                0.9911000
Per.Expl          77.0478579
cvDeviance         0.5109006
cvCorrelation      0.8216070
cvAUC              0.9592600
cvPer.Expl        63.1462878
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ytag) 

                rel.inf
tag          39.6887396
o2_mean_0m   16.8057119
o2_mean_250m 16.3627195
bathy_mean   11.6711360
temp_mean     2.7622574
chl_mean      2.2458410
sal_mean      2.1797729
ssh_mean      1.9088688
pred_var      1.1660577
mld_mean      1.0505143
bathy_sd      0.9624893
vostr_mean    0.9540213
uostr_mean    0.7842279
uo_mean       0.7318377
vo_mean       0.7258045
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          1        tag   624.45
2           2   o2_mean_0m          1        tag   515.76
3          14 o2_mean_250m          1        tag   451.56
4           5     sal_mean          1        tag   437.13
5          14 o2_mean_250m          2 o2_mean_0m   206.71
6           3     chl_mean          1        tag   190.92
7          10     ssh_mean          1        tag   169.35
8           4    temp_mean          1        tag   130.26
9          13     bathy_sd          1        tag   108.91
10          5     sal_mean          2 o2_mean_0m   103.02
11         15     pred_var          1        tag    95.95
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9809756
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6865113
max(e@TPR + e@TNR -1) #TSS
[1] 0.6195304
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ytag)
[1] 77.04786
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3824959 0.6804338 0.8739071  1.200324         0.2923615 0.7704786
do_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3191965
Correlation        0.9122045
AUC                0.9909000
Per.Expl          76.9748229
cvDeviance         0.5103819
cvCorrelation      0.8220382
cvAUC              0.9594100
cvPer.Expl        63.1837098
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ytag) 

                rel.inf
tag          38.7855718
o2_mean_0m   16.7326988
o2_mean_250m 15.8676740
bathy_mean   11.1462711
o2_mean_60m   3.6391761
temp_mean     2.3942027
sal_mean      2.1301516
chl_mean      2.0440835
ssh_mean      1.6371297
pred_var      1.0443052
mld_mean      0.9945159
vostr_mean    0.8817324
bathy_sd      0.8374595
uostr_mean    0.7158327
vo_mean       0.6096311
uo_mean       0.5395639
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          1        tag   569.59
2          15 o2_mean_250m          1        tag   463.02
3           2   o2_mean_0m          1        tag   458.51
4           5     sal_mean          1        tag   328.66
5           4    temp_mean          1        tag   178.30
6           3     chl_mean          1        tag   175.45
7          14  o2_mean_60m          1        tag   168.87
8          10     ssh_mean          1        tag   147.64
9          15 o2_mean_250m          2 o2_mean_0m   141.39
10         13     bathy_sd          1        tag   101.16
11         16     pred_var          1        tag    73.16
12         11     mld_mean          1        tag    72.52
13          7   uostr_mean          1        tag    68.16
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9704697
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6874818
max(e@TPR + e@TNR -1) #TSS
[1] 0.6199684
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ytag)
[1] 76.97482
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3812725 0.6825025 0.8758667  1.198771           0.29994 0.7697482
do_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[12])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3011707
Correlation        0.9188137
AUC                0.9925000
Per.Expl          78.2751108
cvDeviance         0.4820971
cvCorrelation      0.8345785
cvAUC              0.9641300
cvPer.Expl        65.2240256
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ytag) 

                rel.inf
tag          37.5967831
dist_coast   20.6808825
o2_mean_0m   10.5992826
o2_mean_250m  8.5196739
lat           4.6145739
bathy_mean    3.8293017
o2_mean_60m   2.9614725
temp_mean     1.9990685
chl_mean      1.8128857
sal_mean      1.6521470
ssh_mean      1.2297337
pred_var      1.0165713
mld_mean      0.8872307
bathy_sd      0.6138532
uostr_mean    0.5819161
vostr_mean    0.4989877
vo_mean       0.4618753
uo_mean       0.4437606
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           2          lat          1        tag   367.17
2          13   bathy_mean          1        tag   301.62
3           3   o2_mean_0m          1        tag   273.23
4          17 o2_mean_250m          1        tag   253.83
5           6     sal_mean          1        tag   176.41
6           4     chl_mean          1        tag   142.58
7          15   dist_coast          1        tag   137.30
8          11     ssh_mean          1        tag   106.92
9          16  o2_mean_60m          1        tag   100.63
10          5    temp_mean          1        tag    76.80
11         18     pred_var          1        tag    74.23
12         14     bathy_sd          1        tag    69.96
13         12     mld_mean          1        tag    67.22
14          9      vo_mean          1        tag    54.84
15         10   vostr_mean          1        tag    49.70
16         17 o2_mean_250m          3 o2_mean_0m    40.74
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9470103
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.69071
max(e@TPR + e@TNR -1) #TSS
[1] 0.6351745
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ytag)
[1] 78.27511
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3747511 0.6952711 0.8822997  1.196225         0.3168627 0.7827511

AGI models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).

agi_0m_Nspat_Ytag <- readRDS(brt_outputs[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3294417
Correlation        0.9120135
AUC                0.9914000
Per.Expl          76.2356104
cvDeviance         0.5331603
cvCorrelation      0.8132824
cvAUC              0.9557600
cvPer.Expl        61.5402963
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ytag) 

              rel.inf
tag        41.4342777
bathy_mean 22.5768731
temp_mean   8.8146014
AGI_0m      5.3409758
sal_mean    4.3539170
ssh_mean    4.0682555
chl_mean    3.4388349
vostr_mean  2.4504783
uostr_mean  1.5708916
mld_mean    1.3780963
bathy_sd    1.3161888
vo_mean     1.2188877
pred_var    1.0875914
uo_mean     0.9501306
#explore partial plots
gbm.plot(agi_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          11 bathy_mean          1        tag   803.70
2           4   sal_mean          1        tag   650.52
3          13     AGI_0m          3  temp_mean   565.85
4           3  temp_mean          1        tag   323.97
5           9   ssh_mean          1        tag   312.45
6          11 bathy_mean          3  temp_mean   257.21
7          13     AGI_0m          1        tag   254.47
8          12   bathy_sd          1        tag   207.39
9           2   chl_mean          1        tag   203.19
10         10   mld_mean          1        tag   153.81
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 1.000126
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6831368
max(e@TPR + e@TNR -1) #TSS
[1] 0.6098252
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ytag)
[1] 76.23561
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3894603 0.6689158 0.8672566  1.217187         0.2785315 0.7623561
agi_0m_Yspat_Ytag <- readRDS(brt_outputs[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3079406
Correlation        0.9177214
AUC                0.9925000
Per.Expl          77.7865966
cvDeviance         0.4916562
cvCorrelation      0.8299703
cvAUC              0.9625900
cvPer.Expl        64.5342064
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ytag) 

              rel.inf
tag        39.1704772
dist_coast 22.9067106
lat         9.2973311
bathy_mean  7.5281540
temp_mean   4.6964781
AGI_0m      4.1053374
sal_mean    2.6165565
chl_mean    2.4600045
ssh_mean    1.8019982
pred_var    1.1308360
mld_mean    0.9622477
vostr_mean  0.7571586
bathy_sd    0.7305617
uostr_mean  0.6250570
uo_mean     0.6156725
vo_mean     0.5954189
#explore partial plots
gbm.plot(agi_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   646.34
2          12 bathy_mean          1        tag   375.27
3           5   sal_mean          1        tag   260.79
4           3   chl_mean          1        tag   187.34
5           4  temp_mean          1        tag   176.90
6          15 dist_coast          1        tag   162.12
7          14     AGI_0m          1        tag   154.37
8          13   bathy_sd          1        tag   134.66
9          16   pred_var          1        tag   101.32
10         12 bathy_mean          4  temp_mean    99.58
11         10   ssh_mean          1        tag    92.42
12         11   mld_mean          1        tag    67.39
13          9 vostr_mean          1        tag    61.50
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9527297
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6896255
max(e@TPR + e@TNR -1) #TSS
[1] 0.6394782
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ytag)
[1] 77.7866
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained PseudoR2
1 0.3785647 0.6906597 0.880208  1.211341         0.3127222 0.777866
agi_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3298267
Correlation        0.9115959
AUC                0.9912000
Per.Expl          76.2078336
cvDeviance         0.5248204
cvCorrelation      0.8175566
cvAUC              0.9574200
cvPer.Expl        62.1418985
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ytag) 

              rel.inf
tag        41.1130461
bathy_mean 22.1175206
temp_mean   8.8167602
AGI_0m      5.1923827
sal_mean    3.8146119
AGI_60m     3.5914584
ssh_mean    3.3646809
chl_mean    2.8805052
vostr_mean  2.2362877
mld_mean    1.3300028
uostr_mean  1.2641285
vo_mean     1.2078786
bathy_sd    1.1634177
pred_var    1.0421567
uo_mean     0.8651621
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          11 bathy_mean          1        tag   680.43
2          13     AGI_0m          3  temp_mean   677.26
3           4   sal_mean          1        tag   537.76
4           3  temp_mean          1        tag   412.50
5          14    AGI_60m          1        tag   247.42
6          13     AGI_0m          1        tag   229.24
7           9   ssh_mean          1        tag   222.42
8          11 bathy_mean          3  temp_mean   187.87
9          12   bathy_sd          1        tag   151.81
10         10   mld_mean          1        tag   151.59
11          2   chl_mean          1        tag   146.54
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9763022
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6855269
max(e@TPR + e@TNR -1) #TSS
[1] 0.6165029
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ytag)
[1] 76.20783
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3859565 0.6748523 0.8716447  1.212224         0.2957176 0.7620783
agi_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3250560
Correlation        0.9119144
AUC                0.9912000
Per.Expl          76.5519740
cvDeviance         0.5193933
cvCorrelation      0.8180693
cvAUC              0.9578900
cvPer.Expl        62.5333843
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ytag) 

              rel.inf
tag        40.4877590
AGI_250m   16.0481464
bathy_mean 11.8550705
temp_mean  11.3274832
AGI_0m      4.0294752
sal_mean    3.1191434
ssh_mean    2.8171750
chl_mean    2.4561101
vostr_mean  1.7373119
bathy_sd    1.1323632
mld_mean    1.0927830
uostr_mean  1.0841796
pred_var    1.0292176
vo_mean     1.0064972
uo_mean     0.7772848
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag   628.92
2          11 bathy_mean          1        tag   513.82
3          14   AGI_250m          1        tag   395.22
4           3  temp_mean          1        tag   374.74
5           2   chl_mean          1        tag   242.62
6           9   ssh_mean          1        tag   218.77
7          13     AGI_0m          3  temp_mean   203.96
8          13     AGI_0m          1        tag   202.13
9          14   AGI_250m          3  temp_mean   136.10
10         12   bathy_sd          1        tag   120.44
11         15   pred_var          1        tag   112.61
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9969886
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6839568
max(e@TPR + e@TNR -1) #TSS
[1] 0.6028544
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ytag)
[1] 76.55197
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3901855 0.6688813 0.8688766  1.221388         0.2807949 0.7655197
agi_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3194562
Correlation        0.9142194
AUC                0.9917000
Per.Expl          76.9559147
cvDeviance         0.5170729
cvCorrelation      0.8191713
cvAUC              0.9583200
cvPer.Expl        62.7007660
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ytag) 

              rel.inf
tag        39.9057995
AGI_250m   15.9521143
bathy_mean 11.5951719
temp_mean  11.0079998
AGI_0m      3.9165535
sal_mean    2.9778595
AGI_60m     2.5421021
ssh_mean    2.3136229
chl_mean    2.1326512
vostr_mean  1.7126189
uostr_mean  1.2278622
bathy_sd    1.0302718
mld_mean    1.0176906
pred_var    1.0118376
vo_mean     0.9572175
uo_mean     0.6986267
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag   598.01
2          11 bathy_mean          1        tag   474.54
3          15   AGI_250m          1        tag   312.33
4           3  temp_mean          1        tag   311.72
5           2   chl_mean          1        tag   242.05
6          14    AGI_60m          1        tag   224.33
7           9   ssh_mean          1        tag   200.59
8          13     AGI_0m          1        tag   191.78
9          13     AGI_0m          3  temp_mean   179.03
10         12   bathy_sd          1        tag   122.19
11         15   AGI_250m          3  temp_mean   119.14
12          8 vostr_mean          1        tag    81.82
13         16   pred_var          1        tag    80.72
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9850231
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6852647
max(e@TPR + e@TNR -1) #TSS
[1] 0.6117762
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ytag)
[1] 76.95591
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3876588 0.6736314 0.8714891   1.21862         0.2894265 0.7695591
agi_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.2996914
Correlation        0.9203897
AUC                0.9931000
Per.Expl          78.3816567
cvDeviance         0.4880000
cvCorrelation      0.8312630
cvAUC              0.9631300
cvPer.Expl        64.7979495
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ytag) 

              rel.inf
tag        38.8772398
dist_coast 19.6961943
lat         9.2401730
AGI_250m    7.9838041
bathy_mean  4.2759852
temp_mean   4.1810197
AGI_0m      3.2807915
AGI_60m     2.2096760
sal_mean    2.1899170
chl_mean    2.0352351
ssh_mean    1.2407309
pred_var    1.0135341
mld_mean    0.9384425
bathy_sd    0.6260702
vo_mean     0.5836817
uostr_mean  0.5717539
vostr_mean  0.5376097
uo_mean     0.5181415
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   554.02
2          17   AGI_250m          1        tag   272.42
3          12 bathy_mean          1        tag   268.95
4           5   sal_mean          1        tag   205.22
5           3   chl_mean          1        tag   175.43
6          16    AGI_60m          1        tag   134.83
7           4  temp_mean          1        tag   134.10
8          15 dist_coast          1        tag   126.25
9          14     AGI_0m          1        tag   118.51
10         18   pred_var          1        tag    91.28
11         13   bathy_sd          1        tag    73.70
12         10   ssh_mean          1        tag    62.40
13         11   mld_mean          1        tag    54.65
14          8    vo_mean          1        tag    53.91
15         17   AGI_250m          2        lat    44.17
16          9 vostr_mean          1        tag    40.01
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9620935
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6893808
max(e@TPR + e@TNR -1) #TSS
[1] 0.6313319
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ytag)
[1] 78.38166
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE     Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3811484 0.68772 0.8796713  1.218843         0.3059674 0.7838166

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 58.62 0.634 0.723 0.7550 0.948 0.3096 0.793 0.948 0.9923 0.586
base_0m_Nspat_Ytag 83.37 0.286 0.745 0.9199 0.992 0.1920 0.926 0.992 0.9940 0.834
base_0m_Yspat_Ytag 87.20 0.232 0.747 0.9360 0.995 0.1710 0.942 0.995 0.9960 0.872
do_0m_Nspat_Ytag 75.41 0.393 0.740 0.8480 0.981 0.2400 0.881 0.981 0.9980 0.754
do_0m_Yspat_Ytag 77.14 0.366 0.741 0.8580 0.984 0.2300 0.890 0.984 0.9970 0.771
do_0m_60m_Nspat_Ytag 76.37 0.382 0.749 0.8550 0.982 0.2360 0.885 0.982 0.9990 0.764
do_0m_250m_Nspat_Ytag 77.05 0.374 0.741 0.8570 0.983 0.2340 0.886 0.983 0.9980 0.770
do_0m_60m_250m_Nspat_Ytag 76.97 0.376 0.741 0.8530 0.882 0.2350 0.885 0.982 0.9990 0.770
do_0m_60m_250m_Yspat_Ytag 78.28 0.356 0.742 0.8650 0.985 0.2270 0.893 0.985 0.9980 0.783
agi_0m_Nspat_Ytag 76.24 0.377 0.741 0.8660 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_Yspat_Ytag 77.78 0.351 0.743 0.8780 0.986 0.2230 0.898 0.986 1.0020 0.778
agi_0m_60m_Nspat_Ytag 76.21 0.377 0.741 0.8620 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_250m_Nspat_Ytag 76.55 0.371 0.742 0.8650 0.984 0.2300 0.891 0.984 1.0000 0.765
agi_0m_60m_250m_Nspat_Ytag 76.96 0.366 0.742 0.8650 0.985 0.2280 0.893 0.985 1.0000 0.770
agi_0m_60m_250m_Yspat_Ytag 78.38 0.345 0.743 0.8790 0.987 0.2210 0.900 0.987 1.0000 0.784

DO models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

do_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[12])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.6208259
Correlation        0.7971732
AUC                0.9514000
Per.Expl          55.2164882
cvDeviance         0.8375316
cvCorrelation      0.6744345
cvAUC              0.8808600
cvPer.Expl        39.5843427
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 31.526581
o2_mean_0m 23.264544
temp_mean   9.741202
chl_mean    6.921603
sal_mean    5.468262
ssh_mean    4.104042
mld_mean    3.594511
vostr_mean  3.464124
bathy_sd    3.007315
vo_mean     2.617281
uo_mean     2.402761
uostr_mean  2.084738
pred_var    1.803036
#explore partial plots
gbm.plot(do_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ntag)
do_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         11 bathy_mean          3  temp_mean   595.51
2         11 bathy_mean          9   ssh_mean   309.62
3         11 bathy_mean          1 o2_mean_0m   258.95
4          6 uostr_mean          1 o2_mean_0m   238.00
5          7    vo_mean          4   sal_mean   217.39
6          9   ssh_mean          2   chl_mean   164.66
7          3  temp_mean          1 o2_mean_0m   154.42
8          9   ssh_mean          3  temp_mean   134.22
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9544526
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6724849
max(e@TPR + e@TNR -1) #TSS
[1] 0.5489338
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ntag)
[1] 55.21649
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3963781 0.6209157 0.8461656   1.11377         0.3114942 0.5521649
do_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[13])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5580518
Correlation        0.8241516
AUC                0.9634000
Per.Expl          59.7447186
cvDeviance         0.7907379
cvCorrelation      0.6970833
cvAUC              0.8943200
cvPer.Expl        42.9598231
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ntag) 

             rel.inf
dist_coast 29.217842
o2_mean_0m 16.224849
lat        11.255576
bathy_mean  9.441378
temp_mean   6.087307
chl_mean    5.456816
sal_mean    4.766777
ssh_mean    3.184874
mld_mean    2.755934
vostr_mean  2.493747
bathy_sd    1.924312
vo_mean     1.866466
pred_var    1.786529
uo_mean     1.781369
uostr_mean  1.756226
#explore partial plots
gbm.plot(do_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ntag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           7 uostr_mean          1        lat   264.97
2          12 bathy_mean          4  temp_mean   247.85
3           4  temp_mean          1        lat   177.44
4           8    vo_mean          5   sal_mean   143.64
5           5   sal_mean          3   chl_mean   133.48
6           5   sal_mean          1        lat   128.49
7          10   ssh_mean          4  temp_mean    93.48
8          13   bathy_sd          3   chl_mean    92.15
9           2 o2_mean_0m          1        lat    82.50
10          7 uostr_mean          2 o2_mean_0m    79.01
11         13   bathy_sd          8    vo_mean    78.63
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9300919
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6783956
max(e@TPR + e@TNR -1) #TSS
[1] 0.5615477
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ntag)
[1] 59.74472
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3899882 0.6393687 0.8580075  1.126315          0.329067 0.5974472
do_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5960029
Correlation        0.8077326
AUC                0.9559000
Per.Expl          57.0071067
cvDeviance         0.8245456
cvCorrelation      0.6801831
cvAUC              0.8845200
cvPer.Expl        40.5210924
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ntag) 

              rel.inf
bathy_mean  29.474044
o2_mean_0m  20.685900
o2_mean_60m  9.480854
temp_mean    7.844178
chl_mean     6.789479
sal_mean     5.422841
ssh_mean     3.350090
mld_mean     3.315024
vostr_mean   2.806358
bathy_sd     2.456202
vo_mean      2.346608
uo_mean      2.314745
uostr_mean   1.970002
pred_var     1.743674
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ntag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1          11  bathy_mean          3  temp_mean   613.95
2          11  bathy_mean          9   ssh_mean   319.85
3           7     vo_mean          4   sal_mean   255.28
4          11  bathy_mean          1 o2_mean_0m   223.92
5           8  vostr_mean          1 o2_mean_0m   163.94
6           9    ssh_mean          2   chl_mean   141.11
7           6  uostr_mean          1 o2_mean_0m   135.14
8          13 o2_mean_60m          5    uo_mean   130.94
9           3   temp_mean          1 o2_mean_0m   114.13
10         13 o2_mean_60m         11 bathy_mean   102.47
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.953638
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6732678
max(e@TPR + e@TNR -1) #TSS
[1] 0.5547582
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ntag)
[1] 57.00711
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3953646 0.6248424 0.8477561  1.118706         0.3120818 0.5700711
do_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[8])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5951366
Correlation        0.8078475
AUC                0.9566000
Per.Expl          57.0695915
cvDeviance         0.8188375
cvCorrelation      0.6829591
cvAUC              0.8860800
cvPer.Expl        40.9328434
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ntag) 

               rel.inf
o2_mean_0m   25.778726
o2_mean_250m 23.499205
bathy_mean   14.674555
temp_mean     6.793853
sal_mean      5.567258
chl_mean      5.508642
ssh_mean      3.268859
mld_mean      2.659917
bathy_sd      2.467013
vostr_mean    2.183914
vo_mean       2.098303
uo_mean       1.892809
uostr_mean    1.861556
pred_var      1.745391
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          11   bathy_mean          3  temp_mean   273.93
2           6   uostr_mean          1 o2_mean_0m   230.28
3           7      vo_mean          4   sal_mean   217.00
4          11   bathy_mean          9   ssh_mean   122.51
5          11   bathy_mean          1 o2_mean_0m   114.64
6          13 o2_mean_250m          3  temp_mean   107.88
7           9     ssh_mean          2   chl_mean   100.32
8          12     bathy_sd          2   chl_mean    92.32
9          13 o2_mean_250m          1 o2_mean_0m    84.01
10         12     bathy_sd          1 o2_mean_0m    71.35
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9539842
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6731417
max(e@TPR + e@TNR -1) #TSS
[1] 0.5557117
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ntag)
[1] 57.06959
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.395951 0.6237522 0.8475012  1.121258         0.3118321 0.5706959
do_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[9])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5870229
Correlation        0.8110790
AUC                0.9579000
Per.Expl          57.6548789
cvDeviance         0.8118559
cvCorrelation      0.6868417
cvAUC              0.8880400
cvPer.Expl        41.4364689
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ntag) 

               rel.inf
o2_mean_0m   25.288168
o2_mean_250m 22.833546
bathy_mean   13.619527
temp_mean     6.153364
o2_mean_60m   5.801156
chl_mean      5.075643
sal_mean      5.015392
ssh_mean      2.813548
mld_mean      2.571844
bathy_sd      2.243730
vostr_mean    1.854911
vo_mean       1.782417
uostr_mean    1.732225
uo_mean       1.678592
pred_var      1.535935
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          11   bathy_mean          3  temp_mean   205.44
2           7      vo_mean          4   sal_mean   196.39
3          13  o2_mean_60m          5    uo_mean   139.86
4          14 o2_mean_250m          3  temp_mean   124.25
5           6   uostr_mean          1 o2_mean_0m   124.23
6          13  o2_mean_60m          4   sal_mean    97.65
7          11   bathy_mean          9   ssh_mean    94.77
8          11   bathy_mean          1 o2_mean_0m    92.90
9          13  o2_mean_60m         11 bathy_mean    89.46
10         13  o2_mean_60m          2   chl_mean    88.52
11         13  o2_mean_60m          1 o2_mean_0m    78.06
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9490945
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6741771
max(e@TPR + e@TNR -1) #TSS
[1] 0.5563745
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ntag)
[1] 57.65488
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3946264 0.6271225 0.8495688  1.121056         0.3153593 0.5765488
do_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[10])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5527203
Correlation        0.8262598
AUC                0.9640000
Per.Expl          60.1293084
cvDeviance         0.7847145
cvCorrelation      0.7003898
cvAUC              0.8958800
cvPer.Expl        43.3943200
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ntag) 

               rel.inf
dist_coast   26.854489
o2_mean_0m   17.046558
o2_mean_250m 10.577546
lat           8.073813
o2_mean_60m   5.245044
temp_mean     4.966089
chl_mean      4.654707
sal_mean      4.332791
bathy_mean    4.206105
ssh_mean      2.785887
mld_mean      2.207692
bathy_sd      1.607619
pred_var      1.567828
uostr_mean    1.493819
vostr_mean    1.471591
vo_mean       1.467744
uo_mean       1.440676
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          4  temp_mean   132.83
2           8      vo_mean          5   sal_mean   131.72
3          13     bathy_sd          8    vo_mean   106.08
4          16 o2_mean_250m          4  temp_mean   101.85
5           5     sal_mean          1        lat    93.91
6           7   uostr_mean          1        lat    93.37
7           4    temp_mean          1        lat    78.62
8          15  o2_mean_60m          3   chl_mean    78.16
9          10     ssh_mean          2 o2_mean_0m    70.00
10          5     sal_mean          3   chl_mean    65.17
11          4    temp_mean          2 o2_mean_0m    62.51
12         15  o2_mean_60m          6    uo_mean    61.34
13          5     sal_mean          4  temp_mean    59.83
14         10     ssh_mean          1        lat    59.29
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.9288215
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6784688
max(e@TPR + e@TNR -1) #TSS
[1] 0.5641039
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ntag)
[1] 60.12931
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3898282 0.6399039 0.8581617   1.12605         0.3299834 0.6012931

AGI models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

agi_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6340293
Correlation        0.7921513
AUC                0.9492000
Per.Expl          54.2644239
cvDeviance         0.8422348
cvCorrelation      0.6722198
cvAUC              0.8801300
cvPer.Expl        39.2455695
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 33.820563
temp_mean  15.672469
AGI_0m     13.553923
sal_mean    7.590595
chl_mean    5.893989
ssh_mean    5.078249
vostr_mean  3.672166
mld_mean    3.125983
vo_mean     2.601442
bathy_sd    2.554441
uostr_mean  2.512758
uo_mean     2.163741
pred_var    1.759683
#explore partial plots
gbm.plot(agi_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ntag)
agi_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         12     AGI_0m          2  temp_mean  1998.30
2          5 uostr_mean          2  temp_mean   484.90
3         10 bathy_mean          2  temp_mean   307.67
4         12     AGI_0m         10 bathy_mean   268.31
5         10 bathy_mean          8   ssh_mean   245.43
6         12     AGI_0m          8   ssh_mean   193.92
7         10 bathy_mean          3   sal_mean   169.83
8         10 bathy_mean          1   chl_mean   144.06
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9566385
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6718001
max(e@TPR + e@TNR -1) #TSS
[1] 0.555972
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ntag)
[1] 54.26442
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.397247 0.6176718 0.8448034  1.110872         0.3099025 0.5426442
agi_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.5780131
Correlation        0.8154685
AUC                0.9598000
Per.Expl          58.3051410
cvDeviance         0.7941253
cvCorrelation      0.6953459
cvAUC              0.8929600
cvPer.Expl        42.7159357
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ntag) 

             rel.inf
dist_coast 31.192044
lat        13.232571
AGI_0m     11.143744
bathy_mean 10.268145
temp_mean   8.169900
sal_mean    5.586783
chl_mean    4.714932
ssh_mean    2.882465
vostr_mean  2.390455
mld_mean    2.379627
uostr_mean  1.674082
pred_var    1.643717
bathy_sd    1.627854
uo_mean     1.552650
vo_mean     1.541032
#explore partial plots
gbm.plot(agi_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           6 uostr_mean          1        lat   611.59
2          13     AGI_0m          3  temp_mean   437.86
3           3  temp_mean          1        lat   361.18
4          13     AGI_0m          1        lat   325.74
5          11 bathy_mean          3  temp_mean   166.89
6           4   sal_mean          1        lat   161.98
7           4   sal_mean          3  temp_mean   134.61
8          13     AGI_0m         11 bathy_mean   102.86
9          13     AGI_0m          9   ssh_mean    94.53
10         12   bathy_sd          4   sal_mean    64.04
11         13     AGI_0m          4   sal_mean    58.91
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9274641
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6773493
max(e@TPR + e@TNR -1) #TSS
[1] 0.5714618
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ntag)
[1] 58.30514
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3902807 0.6373155 0.8559211  1.121823         0.3309482 0.5830514
agi_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6197682
Correlation        0.7984914
AUC                0.9521000
Per.Expl          55.2931460
cvDeviance         0.8348906
cvCorrelation      0.6763212
cvAUC              0.8824700
cvPer.Expl        39.7753371
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ntag) 

             rel.inf
bathy_mean 33.692502
temp_mean  14.934611
AGI_0m     13.455631
sal_mean    7.030234
chl_mean    5.106212
AGI_60m     4.659823
ssh_mean    4.388314
vostr_mean  3.325690
mld_mean    3.155754
vo_mean     2.331571
uostr_mean  2.294309
bathy_sd    2.116380
uo_mean     1.896765
pred_var    1.612204
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  2120.20
2           5 uostr_mean          2  temp_mean   314.07
3          12     AGI_0m         10 bathy_mean   295.79
4          10 bathy_mean          8   ssh_mean   288.22
5          12     AGI_0m          3   sal_mean   224.15
6          10 bathy_mean          2  temp_mean   204.44
7          10 bathy_mean          1   chl_mean   164.66
8          10 bathy_mean          3   sal_mean   162.36
9          13    AGI_60m          3   sal_mean   151.07
10         13    AGI_60m         10 bathy_mean   125.43
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9471661
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6733886
max(e@TPR + e@TNR -1) #TSS
[1] 0.5615535
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ntag)
[1] 55.29315
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3951172 0.6232473 0.848006  1.110951         0.3167357 0.5529315
agi_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6021139
Correlation        0.8057358
AUC                0.9558000
Per.Expl          56.5666349
cvDeviance         0.8206077
cvCorrelation      0.6822655
cvAUC              0.8856300
cvPer.Expl        40.8056357
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ntag) 

             rel.inf
AGI_250m   22.838377
temp_mean  17.247651
bathy_mean 16.982394
AGI_0m     11.714465
sal_mean    6.879946
chl_mean    4.686083
ssh_mean    3.667093
vostr_mean  3.183649
mld_mean    2.564665
uostr_mean  2.319691
bathy_sd    2.131897
vo_mean     2.042541
uo_mean     1.929522
pred_var    1.812026
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  1466.67
2          13   AGI_250m          2  temp_mean   218.76
3          12     AGI_0m          8   ssh_mean   152.09
4          12     AGI_0m          3   sal_mean   140.15
5           5 uostr_mean          2  temp_mean   132.63
6           3   sal_mean          2  temp_mean   127.78
7          10 bathy_mean          1   chl_mean   126.84
8          10 bathy_mean          8   ssh_mean   125.55
9          10 bathy_mean          3   sal_mean   109.81
10         12     AGI_0m         10 bathy_mean   108.81
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9461757
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6747344
max(e@TPR + e@TNR -1) #TSS
[1] 0.5604837
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ntag)
[1] 56.56663
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3943126 0.626913 0.8506919  1.120037         0.3174501 0.5656663
agi_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.5667515
Correlation        0.8224811
AUC                0.9633000
Per.Expl          59.1174968
cvDeviance         0.8149061
cvCorrelation      0.6849465
cvAUC              0.8872100
cvPer.Expl        41.2169189
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ntag) 

             rel.inf
AGI_250m   21.018347
bathy_mean 17.029046
temp_mean  16.367793
AGI_0m     10.504762
sal_mean    6.417153
AGI_60m     4.812848
chl_mean    4.613102
ssh_mean    3.196548
vostr_mean  3.066020
mld_mean    2.767906
uostr_mean  2.454455
uo_mean     2.076121
bathy_sd    1.970942
vo_mean     1.946589
pred_var    1.758368
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  1619.82
2          12     AGI_0m          3   sal_mean   301.89
3          14   AGI_250m          2  temp_mean   265.33
4           5 uostr_mean          2  temp_mean   169.21
5          10 bathy_mean          3   sal_mean   150.41
6          10 bathy_mean          2  temp_mean   125.60
7          10 bathy_mean          8   ssh_mean   120.88
8          12     AGI_0m         10 bathy_mean   100.88
9          10 bathy_mean          1   chl_mean   100.70
10         14   AGI_250m         13    AGI_60m    90.76
11         12     AGI_0m          8   ssh_mean    85.03
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.93999
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6750973
max(e@TPR + e@TNR -1) #TSS
[1] 0.5640868
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ntag)
[1] 59.1175
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.393462 0.6310036 0.8514003  1.128121         0.3219123 0.591175
agi_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.5474472
Correlation        0.8299837
AUC                0.9661000
Per.Expl          60.5100091
cvDeviance         0.7845229
cvCorrelation      0.7002350
cvAUC              0.8953900
cvPer.Expl        43.4086045
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ntag) 

             rel.inf
dist_coast 27.495072
lat        12.015818
AGI_250m    9.983021
AGI_0m      9.789768
temp_mean   7.832861
bathy_mean  5.812078
sal_mean    5.049246
chl_mean    4.305359
AGI_60m     3.697070
mld_mean    2.409316
ssh_mean    2.323457
vostr_mean  1.860367
bathy_sd    1.627986
pred_var    1.535616
uostr_mean  1.453659
uo_mean     1.441664
vo_mean     1.367642
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean   528.25
2          13     AGI_0m          1        lat   389.25
3           6 uostr_mean          1        lat   308.88
4           3  temp_mean          1        lat   192.77
5           4   sal_mean          1        lat    94.77
6          13     AGI_0m          9   ssh_mean    78.08
7          13     AGI_0m          2   chl_mean    71.42
8          12   bathy_sd          4   sal_mean    66.68
9          13     AGI_0m         11 bathy_mean    65.51
10         15    AGI_60m         13     AGI_0m    62.14
11         15    AGI_60m         14 dist_coast    60.98
12         17   pred_var         15    AGI_60m    58.69
13         16   AGI_250m         15    AGI_60m    56.29
14         13     AGI_0m          5    uo_mean    52.34
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.9198415
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.6788121
max(e@TPR + e@TNR -1) #TSS
[1] 0.5754677
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ntag)
[1] 60.51001
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3882204 0.6438738 0.8588545   1.12942         0.3364471 0.6051001

Summary table of results

output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 58.620 0.634 0.723 0.755 0.948 0.3096 0.793 0.9480 0.9923 0.586
do_0m_Nspat_Ntag 55.216 0.687 0.714 0.699 0.930 0.3280 0.760 0.9300 1.0000 0.552
do_0m_Yspat_Ntag 59.744 0.631 0.720 0.727 0.942 0.3120 0.784 0.9423 1.0000 0.597
do_0m_60m_Nspat_Ntag 57.007 0.666 0.717 0.712 0.935 0.3220 0.769 0.9350 1.0000 0.570
do_0m_250m_Nspat_Ntag 57.070 0.665 0.717 0.717 0.935 0.3220 0.769 0.9350 1.0000 0.571
do_0m_60m_250m_Nspat_Ntag 57.650 0.659 0.718 0.718 0.937 0.3200 0.772 0.9370 1.0000 0.577
do_0m_60m_250m_Yspat_Ntag 60.130 0.629 0.721 0.729 0.943 0.3120 0.785 0.9420 1.0000 0.601
agi_0m_Nspat_Ntag 54.264 0.695 0.714 0.700 0.929 0.3300 0.757 0.9290 0.9970 0.543
agi_0m_Yspat_Ntag 58.310 0.638 0.720 0.729 0.942 0.3150 0.782 0.9420 0.9990 0.583
agi_0m_60m_Nspat_Ntag 55.293 0.685 0.715 0.703 0.931 0.3270 0.761 0.9310 0.9990 0.553
agi_0m_250m_Nspat_Ntag 56.600 0.663 0.718 0.714 0.937 0.3210 0.771 0.9370 0.9990 0.566
agi_0m_60m_250m_Nspat_Ntag 59.120 0.633 0.721 0.731 0.944 0.3130 0.785 0.9440 0.9990 0.591
agi_0m_60m_250m_Yspat_Ntag 60.510 0.608 0.724 0.746 0.949 0.3060 0.796 0.9490 0.9990 0.605

Base models w/o tag ID and w/ data at seasonal and annual resolutions

For these models, the environmental raster data was averaged according to season and year. Observed and pseudo absence locations were then used for environmental data extraction along these raster files and were matched to each file according to either the season or year.

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = dat_test_base_s)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862921
Residual.Deviance  0.8807219
Correlation        0.6570729
AUC                0.8771000
Per.Expl          36.4692388
cvDeviance         0.9695580
cvCorrelation      0.5944613
cvAUC              0.8396600
cvPer.Expl        30.0610607
[1] "Relative influence of predictor variables"

             rel.inf
vostr_mean 15.572003
uostr_mean 13.382375
bathy_mean 13.317586
temp_mean  11.424998
vo_mean    11.376691
ssh_mean    9.814064
sal_mean    9.165748
mld_mean    5.198750
chl_mean    4.650146
uo_mean     2.770662
bathy_sd    1.797589
pred_var    1.529388
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          2   sal_mean          1   mld_mean   170.30
2         10 bathy_mean          6 uostr_mean    84.52
3          7    vo_mean          4  temp_mean    77.40
4          8 vostr_mean          4  temp_mean    74.31
5         10 bathy_mean          2   sal_mean    60.62
6          8 vostr_mean          2   sal_mean    51.57
7          6 uostr_mean          2   sal_mean    49.62
[1] "Calculated percent deviance"
[1] 0.9041464

[1] "TPR"
[1] 0.6830441
[1] "TSS"
[1] 0.5681
[1] "Percent deviance calculated by in house functions"
[1] 36.46924
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3850375 0.6407874 0.8673864 0.9890539         0.3475683 0.3646924
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = dat_test_base_a)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862692
Residual.Deviance  0.7349386
Correlation        0.7459693
AUC                0.9281000
Per.Expl          46.9844254
cvDeviance         0.9417038
cvCorrelation      0.6104769
cvAUC              0.8493500
cvPer.Expl        32.0691975
[1] "Relative influence of predictor variables"

             rel.inf
vostr_mean 21.617278
uostr_mean 14.250460
sal_mean   10.315309
bathy_mean  9.874017
mld_mean    8.510153
chl_mean    7.160177
temp_mean   6.989894
ssh_mean    5.713556
vo_mean     5.707209
uo_mean     3.859038
bathy_sd    3.093930
pred_var    2.908980
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          3   ssh_mean          2   sal_mean   662.38
2          8 vostr_mean          4  temp_mean   556.71
3          8 vostr_mean          6 uostr_mean   501.98
4          6 uostr_mean          3   ssh_mean   250.67
5          6 uostr_mean          2   sal_mean   186.36
6          4  temp_mean          3   ssh_mean   178.75
7         10 bathy_mean          8 vostr_mean   178.13
[1] "Calculated percent deviance"
[1] 0.8014552

[1] "TPR"
[1] 0.7008862
[1] "TSS"
[1] 0.6315782
[1] "Percent deviance calculated by in house functions"
[1] 46.98443
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3585463 0.7019189 0.9031209 0.9931701         0.4218617 0.4698443

DO models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.8131538
Correlation        0.6994460
AUC                0.9026000
Per.Expl          41.3432465
cvDeviance         0.9300675
cvCorrelation      0.6197794
cvAUC              0.8540500
cvPer.Expl        32.9096909
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_250m_seas 25.152468
o2_mean_0m_seas   25.038229
temp_mean          8.349439
bathy_mean         8.224763
o2_mean_60m_seas   7.769275
sal_mean           6.060609
ssh_mean           4.248712
chl_mean           4.174670
mld_mean           2.601719
bathy_sd           1.708980
vostr_mean         1.599245
vo_mean            1.477038
uo_mean            1.363871
pred_var           1.157471
uostr_mean         1.073510
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1          13   o2_mean_0m_seas          8        ssh_mean   145.76
2          15 o2_mean_250m_seas         13 o2_mean_0m_seas   108.53
3          10        bathy_mean          2       temp_mean    87.06
4          13   o2_mean_0m_seas          2       temp_mean    70.31
5          14  o2_mean_60m_seas          3        sal_mean    59.91
6          14  o2_mean_60m_seas          2       temp_mean    52.99
7           8          ssh_mean          3        sal_mean    49.60
8          13   o2_mean_0m_seas          1        chl_mean    44.89
9          11          bathy_sd          6         vo_mean    42.78
10         13   o2_mean_0m_seas         10      bathy_mean    38.04
11          6           vo_mean          3        sal_mean    34.64
[1] "Calculated percent deviance"
[1] 0.8354198

[1] "TPR"
[1] 0.6949432
[1] "TSS"
[1] 0.6043099
[1] "Percent deviance calculated by in house functions"
[1] 41.34325
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3676294 0.6829054 0.8912124 0.9932145         0.3973599 0.4134325
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.7943064
Correlation        0.7096714
AUC                0.9084000
Per.Expl          42.7028024
cvDeviance         0.9108600
cvCorrelation      0.6311181
cvAUC              0.8612700
cvPer.Expl        34.2952205
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        25.1841934
o2_mean_0m_seas   18.1479440
o2_mean_250m_seas 12.5170316
temp_mean          7.7078633
o2_mean_60m_seas   6.1992179
sal_mean           6.0402895
lat                5.3149710
chl_mean           3.8851264
bathy_mean         3.6556887
ssh_mean           2.9063749
mld_mean           2.1701965
vostr_mean         1.3936376
vo_mean            1.0712342
pred_var           1.0113083
bathy_sd           1.0112493
uostr_mean         0.9104869
uo_mean            0.8731865
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1          11        bathy_mean          3       temp_mean   134.39
2          15   o2_mean_0m_seas          9        ssh_mean   125.42
3          17 o2_mean_250m_seas          4        sal_mean    73.23
4          15   o2_mean_0m_seas          3       temp_mean    52.57
5          17 o2_mean_250m_seas         15 o2_mean_0m_seas    46.03
6          15   o2_mean_0m_seas          2        chl_mean    35.95
7          15   o2_mean_0m_seas          1             lat    32.66
8           9          ssh_mean          4        sal_mean    31.50
9           4          sal_mean          1             lat    31.02
10         14          pred_var          5         uo_mean    30.33
11          3         temp_mean          1             lat    28.85
12         16  o2_mean_60m_seas          3       temp_mean    26.24
13         12          bathy_sd          7         vo_mean    26.07
14         16  o2_mean_60m_seas          4        sal_mean    25.70
[1] "Calculated percent deviance"
[1] 0.8183304

[1] "TPR"
[1] 0.6976123
[1] "TSS"
[1] 0.6129695
[1] "Percent deviance calculated by in house functions"
[1] 42.7028
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained PseudoR2
1 0.3636406 0.6915222 0.896574 0.9916489         0.4096875 0.427028
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.7176601
Correlation        0.7584527
AUC                0.9357000
Per.Expl          48.2316706
cvDeviance         0.9471687
cvCorrelation      0.6067520
cvAUC              0.8474300
cvPer.Expl        31.6760950
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_250m_ann 23.641945
temp_mean        12.889831
o2_mean_0m_ann   10.482357
o2_mean_60m_ann   8.299163
bathy_mean        8.044860
sal_mean          6.948934
chl_mean          6.713889
ssh_mean          4.633439
mld_mean          3.062796
vostr_mean        3.051127
bathy_sd          3.037426
uo_mean           2.433320
vo_mean           2.391635
pred_var          2.219267
uostr_mean        2.150013
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index      var1.names var2.index var2.names int.size
1          14 o2_mean_60m_ann          2  temp_mean   289.05
2          13  o2_mean_0m_ann          2  temp_mean   220.91
3          14 o2_mean_60m_ann          3   sal_mean   182.87
4           6         vo_mean          3   sal_mean   120.76
5           8        ssh_mean          3   sal_mean   114.99
6           3        sal_mean          2  temp_mean   103.12
7          13  o2_mean_0m_ann          1   chl_mean   102.61
8          10      bathy_mean          2  temp_mean    87.04
9          14 o2_mean_60m_ann         10 bathy_mean    77.37
10          7      vostr_mean          2  temp_mean    65.93
11          2       temp_mean          1   chl_mean    64.65
[1] "Calculated percent deviance"
[1] 0.7843388

[1] "TPR"
[1] 0.7037686
[1] "TSS"
[1] 0.6405072
[1] "Percent deviance calculated by in house functions"
[1] 48.23167
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3541984 0.7117905 0.9089163 0.9935829         0.4342078 0.4823167
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.6798841
Correlation        0.7782285
AUC                0.9456000
Per.Expl          50.9566399
cvDeviance         0.9261329
cvCorrelation      0.6194279
cvAUC              0.8549100
cvPer.Expl        33.1935140
[1] "Relative influence of predictor variables"

                   rel.inf
dist_coast       22.057080
o2_mean_250m_ann 11.773782
temp_mean         9.550491
lat               8.313899
sal_mean          7.157929
chl_mean          6.201912
o2_mean_60m_ann   5.734171
o2_mean_0m_ann    5.407617
bathy_mean        4.364136
ssh_mean          3.518572
mld_mean          2.878000
vostr_mean        2.455810
pred_var          2.413791
vo_mean           2.145230
uo_mean           2.082053
bathy_sd          2.006137
uostr_mean        1.939390
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index var2.names int.size
1          16  o2_mean_60m_ann          3  temp_mean   212.58
2          16  o2_mean_60m_ann         13 dist_coast   138.39
3           7          vo_mean          4   sal_mean    95.23
4           3        temp_mean          2   chl_mean    94.12
5          13       dist_coast          4   sal_mean    93.00
6          15   o2_mean_0m_ann          3  temp_mean    84.34
7          12         bathy_sd          7    vo_mean    83.82
8          14         pred_var          5    uo_mean    76.36
9          17 o2_mean_250m_ann          9   ssh_mean    73.04
10         16  o2_mean_60m_ann         11 bathy_mean    70.61
11          3        temp_mean          1        lat    67.02
12          6       uostr_mean          1        lat    66.88
13         16  o2_mean_60m_ann          4   sal_mean    64.38
14         16  o2_mean_60m_ann          1        lat    56.92
[1] "Calculated percent deviance"
[1] 0.7561192

[1] "TPR"
[1] 0.7075701
[1] "TSS"
[1] 0.657902
[1] "Percent deviance calculated by in house functions"
[1] 50.95664
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE     Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3468104 0.72634 0.9165027 0.9934549         0.4545643 0.5095664
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.6142689
Correlation        0.8049682
AUC                0.9558000
Per.Expl          55.6897832
cvDeviance         0.8616534
cvCorrelation      0.6590420
cvAUC              0.8773000
cvPer.Expl        37.8447355
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_250m_ann  17.508228
o2_mean_0m        16.052381
o2_mean_0m_seas    9.525396
temp_mean          6.200544
bathy_mean         5.333500
o2_mean_60m_ann    4.994221
o2_mean_250m_seas  4.891444
o2_mean_60m_seas   4.548159
sal_mean           4.407086
chl_mean           3.955977
o2_mean_0m_ann     2.818546
o2_mean_60m        2.730506
o2_mean_250m       2.565295
ssh_mean           2.552316
mld_mean           2.348539
bathy_sd           1.760717
vostr_mean         1.668218
pred_var           1.649356
vo_mean            1.592859
uo_mean            1.552592
uostr_mean         1.344119
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1           3         temp_mean          1      o2_mean_0m   231.00
2          13       o2_mean_60m         11      bathy_mean   100.49
3          20   o2_mean_60m_ann         11      bathy_mean    98.72
4           9          ssh_mean          4        sal_mean    95.80
5          15          pred_var          5         uo_mean    93.47
6          16   o2_mean_0m_seas          9        ssh_mean    73.81
7          20   o2_mean_60m_ann          3       temp_mean    64.13
8          16   o2_mean_0m_seas          1      o2_mean_0m    52.64
9           5           uo_mean          3       temp_mean    50.48
10         18 o2_mean_250m_seas         14    o2_mean_250m    43.25
11         13       o2_mean_60m          3       temp_mean    41.61
12         11        bathy_mean          3       temp_mean    40.83
13          7           vo_mean          4        sal_mean    40.52
14         16   o2_mean_0m_seas          2        chl_mean    40.00
15          8        vostr_mean          4        sal_mean    38.58
16         17  o2_mean_60m_seas          4        sal_mean    38.16
17         19    o2_mean_0m_ann          3       temp_mean    34.41
18          8        vostr_mean          3       temp_mean    34.18
19         18 o2_mean_250m_seas         16 o2_mean_0m_seas    33.85
20          4          sal_mean          3       temp_mean    33.18
21         14      o2_mean_250m         11      bathy_mean    32.85
22         19    o2_mean_0m_ann         16 o2_mean_0m_seas    29.31
[1] "Calculated percent deviance"
[1] 0.6793058

[1] "TPR"
[1] 0.7166256
[1] "TSS"
[1] 0.7026526
[1] "Percent deviance calculated by in house functions"
[1] 55.68978
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3251921 0.7649245 0.9346237 0.9966431         0.5099746 0.5568978
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862918
Residual.Deviance  0.6040844
Correlation        0.8098984
AUC                0.9584000
Per.Expl          56.4244434
cvDeviance         0.8463090
cvCorrelation      0.6667445
cvAUC              0.8814800
cvPer.Expl        38.9515988
[1] "Relative influence of predictor variables"

                    rel.inf
dist_coast        18.347823
o2_mean_0m        13.012223
o2_mean_250m_ann   9.919182
o2_mean_0m_seas    7.879467
temp_mean          5.776584
sal_mean           4.488389
o2_mean_60m_seas   4.088402
lat                4.082625
o2_mean_60m_ann    3.740902
chl_mean           3.235370
bathy_mean         2.900908
o2_mean_250m_seas  2.609905
o2_mean_0m_ann     2.464034
o2_mean_250m       2.360227
o2_mean_60m        2.330307
mld_mean           2.149688
ssh_mean           2.035443
vostr_mean         1.604121
vo_mean            1.522583
pred_var           1.488164
uo_mean            1.347736
bathy_sd           1.317520
uostr_mean         1.298396
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1           4         temp_mean          2      o2_mean_0m   250.91
2          22   o2_mean_60m_ann         14      dist_coast    94.71
3          10          ssh_mean          5        sal_mean    85.95
4          20 o2_mean_250m_seas         16    o2_mean_250m    79.14
5          17          pred_var          6         uo_mean    57.74
6          18   o2_mean_0m_seas         10        ssh_mean    57.35
7          12        bathy_mean          4       temp_mean    52.46
8          18   o2_mean_0m_seas          2      o2_mean_0m    49.37
9           4         temp_mean          1             lat    47.92
10          9        vostr_mean          1             lat    41.09
11         22   o2_mean_60m_ann         12      bathy_mean    39.78
12          5          sal_mean          1             lat    36.03
13         20 o2_mean_250m_seas          5        sal_mean    34.76
14         15       o2_mean_60m          4       temp_mean    34.27
15         21    o2_mean_0m_ann          4       temp_mean    31.85
16         14        dist_coast         11        mld_mean    30.44
17         18   o2_mean_0m_seas          3        chl_mean    30.09
18          7        uostr_mean          1             lat    29.21
19          9        vostr_mean          5        sal_mean    28.85
20         18   o2_mean_0m_seas          5        sal_mean    28.79
21         21    o2_mean_0m_ann         18 o2_mean_0m_seas    28.15
22         15       o2_mean_60m         14      dist_coast    28.14
23          2        o2_mean_0m          1             lat    27.99
24         15       o2_mean_60m         12      bathy_mean    26.08
25         22   o2_mean_60m_ann          1             lat    25.67
26          8           vo_mean          1             lat    24.65
[1] "Calculated percent deviance"
[1] 0.6693636

[1] "TPR"
[1] 0.7177252
[1] "TSS"
[1] 0.7108199
[1] "Percent deviance calculated by in house functions"
[1] 56.42444
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3226575 0.7693831 0.9368542 0.9956468         0.5171465 0.5642444

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_do_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
brt_base_0m_seas_Nspat_Ntag 36.469 0.904 0.683 0.568 0.867 0.385 0.6641 0.867 0.989 0.364
brt_base_0m_ann_Nspat_Ntag 46.984 0.801 0.701 0.632 0.903 0.359 0.7020 0.903 0.993 0.470
brt_do_0m_60m_250m_seas_Nspat_Ntag 41.343 0.835 0.695 0.604 0.891 0.368 0.6830 0.891 0.993 0.413
brt_do_0m_60m_250m_seas_Yspat_Ntag 42.703 0.818 0.697 0.612 0.897 0.363 0.6920 0.897 0.992 0.427
brt_do_0m_60m_250m_ann_Nspat_Ntag 48.232 0.784 0.704 0.641 0.909 0.354 0.7120 0.909 0.994 0.482
brt_do_0m_60m_250m_ann_Yspat_Ntag 50.957 0.756 0.708 0.658 0.917 0.347 0.7260 0.917 0.993 0.510
brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag 55.689 0.679 0.717 0.703 0.935 0.325 0.7650 0.935 0.997 0.557
brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag 56.424 0.669 0.718 0.711 0.937 0.323 0.7690 0.937 0.996 0.564

AGI models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.8268163
Correlation        0.6936130
AUC                0.8986000
Per.Expl          40.3578088
cvDeviance         0.9406789
cvCorrelation      0.6142010
cvAUC              0.8511600
cvPer.Expl        32.1443594
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_seas 22.495151
temp_mean     18.833854
bathy_mean    12.566968
AGI_0m_seas   10.609661
sal_mean       7.724491
AGI_60m_seas   6.588969
chl_mean       4.284478
ssh_mean       3.966623
mld_mean       3.261915
vostr_mean     1.910101
vo_mean        1.882665
bathy_sd       1.880424
uostr_mean     1.532605
uo_mean        1.431103
pred_var       1.030994
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          13   AGI_0m_seas          2  temp_mean   179.11
2          14  AGI_60m_seas          2  temp_mean   153.23
3          15 AGI_250m_seas          2  temp_mean    99.73
4          10    bathy_mean          2  temp_mean    86.34
5          13   AGI_0m_seas          9   mld_mean    56.60
6          15 AGI_250m_seas          3   sal_mean    56.07
7           6       vo_mean          3   sal_mean    47.39
8           3      sal_mean          2  temp_mean    45.73
9           8      ssh_mean          7 vostr_mean    35.35
10         13   AGI_0m_seas          8   ssh_mean    33.50
11         15 AGI_250m_seas          8   ssh_mean    29.20
[1] "Calculated percent deviance"
[1] 0.8573576

[1] "TPR"
[1] 0.6919068
[1] "TSS"
[1] 0.591201
[1] "Percent deviance calculated by in house functions"
[1] 40.35781
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3732395 0.6706646 0.8851429  1.004349         0.3815426 0.4035781
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.8046600
Correlation        0.7051691
AUC                0.9051000
Per.Expl          41.9560507
cvDeviance         0.9214250
cvCorrelation      0.6256203
cvAUC              0.8579300
cvPer.Expl        33.5332361
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    28.6804740
temp_mean     10.5713934
AGI_250m_seas  9.9822707
AGI_0m_seas    9.7909657
lat            9.4561834
sal_mean       6.9826658
bathy_mean     4.3401161
AGI_60m_seas   4.1728652
chl_mean       4.1354833
ssh_mean       2.9829684
mld_mean       2.4380740
vostr_mean     1.2223646
vo_mean        1.1839411
bathy_sd       1.1201075
uo_mean        1.0086115
uostr_mean     0.9777738
pred_var       0.9537415
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          11    bathy_mean          3  temp_mean   114.29
2          16  AGI_60m_seas          3  temp_mean   102.24
3          17 AGI_250m_seas          4   sal_mean    81.86
4          15   AGI_0m_seas          3  temp_mean    77.40
5           4      sal_mean          3  temp_mean    59.33
6          15   AGI_0m_seas         10   mld_mean    48.89
7           3     temp_mean          2   chl_mean    46.44
8           4      sal_mean          1        lat    41.65
9           3     temp_mean          1        lat    36.48
10         13    dist_coast          4   sal_mean    26.50
11          8    vostr_mean          3  temp_mean    26.34
12         15   AGI_0m_seas          9   ssh_mean    25.43
13         13    dist_coast          3  temp_mean    23.53
14         15   AGI_0m_seas          2   chl_mean    20.08
[1] "Calculated percent deviance"
[1] 0.8361803

[1] "TPR"
[1] 0.695381
[1] "TSS"
[1] 0.6086011
[1] "Percent deviance calculated by in house functions"
[1] 41.95605
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3681478 0.6819723 0.8921047  1.003881         0.3968189 0.4195605
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.6959050
Correlation        0.7712113
AUC                0.9420000
Per.Expl          49.8010643
cvDeviance         0.9441861
cvCorrelation      0.6097757
cvAUC              0.8486600
cvPer.Expl        31.8913704
[1] "Relative influence of predictor variables"

               rel.inf
AGI_250m_ann 22.406041
temp_mean    17.235664
bathy_mean    8.397350
sal_mean      8.298281
AGI_60m_ann   7.666685
chl_mean      6.325482
AGI_0m_ann    5.276719
ssh_mean      4.597771
mld_mean      3.773413
bathy_sd      3.160114
vostr_mean    2.932974
vo_mean       2.798101
uo_mean       2.498186
uostr_mean    2.416789
pred_var      2.216428
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index  var2.names int.size
1           3     sal_mean          2   temp_mean   154.17
2          15 AGI_250m_ann          2   temp_mean   151.55
3           2    temp_mean          1    chl_mean   135.85
4          15 AGI_250m_ann         13  AGI_0m_ann   132.98
5          15 AGI_250m_ann         14 AGI_60m_ann   104.14
6          15 AGI_250m_ann          3    sal_mean    86.20
7          14  AGI_60m_ann          8    ssh_mean    65.73
8           8     ssh_mean          1    chl_mean    58.47
9           7   vostr_mean          2   temp_mean    57.14
10         14  AGI_60m_ann         10  bathy_mean    52.37
11         10   bathy_mean          2   temp_mean    50.80
[1] "Calculated percent deviance"
[1] 0.7649684

[1] "TPR"
[1] 0.7078821
[1] "TSS"
[1] 0.6642425
[1] "Percent deviance calculated by in house functions"
[1] 49.80106
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5100 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3476602 0.7264609 0.9171669  1.001948         0.4481878 0.4980106
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.6705049
Correlation        0.7830331
AUC                0.9474000
Per.Expl          51.6332921
cvDeviance         0.9180075
cvCorrelation      0.6254329
cvAUC              0.8580900
cvPer.Expl        33.7797521
[1] "Relative influence of predictor variables"

               rel.inf
dist_coast   22.826936
temp_mean    10.590660
AGI_250m_ann 10.555821
lat           9.964422
sal_mean      7.218453
chl_mean      5.728103
AGI_60m_ann   5.505228
AGI_0m_ann    4.845873
bathy_mean    4.141557
ssh_mean      3.669272
mld_mean      3.011325
vo_mean       2.074831
pred_var      2.072059
vostr_mean    2.022348
uo_mean       1.952451
bathy_sd      1.920724
uostr_mean    1.899938
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1        lat   141.89
2           4     sal_mean          3  temp_mean   137.65
3           3    temp_mean          2   chl_mean   136.75
4          16  AGI_60m_ann          1        lat    97.93
5           7      vo_mean          4   sal_mean    97.39
6          13   dist_coast          3  temp_mean    88.57
7          11   bathy_mean          9   ssh_mean    79.33
8           4     sal_mean          1        lat    77.73
9          16  AGI_60m_ann         11 bathy_mean    71.48
10         11   bathy_mean          3  temp_mean    69.82
11         17 AGI_250m_ann         13 dist_coast    64.01
12         13   dist_coast          4   sal_mean    59.92
13         16  AGI_60m_ann         13 dist_coast    56.36
14          9     ssh_mean          8 vostr_mean    52.48
[1] "Calculated percent deviance"
[1] 0.7410847

[1] "TPR"
[1] 0.7110043
[1] "TSS"
[1] 0.6766013
[1] "Percent deviance calculated by in house functions"
[1] 51.63329
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3412925 0.7386613 0.9234047  1.002404         0.4654164 0.5163329
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.5710221
Correlation        0.8264079
AUC                0.9654000
Per.Expl          58.8094597
cvDeviance         0.8440518
cvCorrelation      0.6685080
cvAUC              0.8819900
cvPer.Expl        39.1145274
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_ann  13.001178
temp_mean     12.830749
AGI_0m        10.918778
bathy_mean     7.845397
AGI_0m_seas    6.952762
sal_mean       5.272075
AGI_60m_ann    4.988499
AGI_250m_seas  4.690531
AGI_60m_seas   4.217959
AGI_0m_ann     3.771886
ssh_mean       3.665159
AGI_250m       3.494551
chl_mean       3.068918
mld_mean       2.540812
vostr_mean     2.043824
vo_mean        1.961488
bathy_sd       1.907967
AGI_60m        1.892487
uostr_mean     1.739288
uo_mean        1.683661
pred_var       1.512032
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index    var2.names int.size
1          12        AGI_0m          2     temp_mean  1616.55
2          20   AGI_60m_ann         16   AGI_0m_seas   195.59
3          16   AGI_0m_seas         14      AGI_250m   149.01
4          12        AGI_0m          3      sal_mean   134.56
5          12        AGI_0m         10    bathy_mean   132.39
6          19    AGI_0m_ann         16   AGI_0m_seas   128.34
7          12        AGI_0m          8      ssh_mean   110.96
8           7    vostr_mean          2     temp_mean    95.25
9          13       AGI_60m         10    bathy_mean    67.72
10         16   AGI_0m_seas          2     temp_mean    56.94
11         20   AGI_60m_ann          8      ssh_mean    53.36
12          8      ssh_mean          7    vostr_mean    50.54
13         16   AGI_0m_seas          9      mld_mean    47.71
14         18 AGI_250m_seas          2     temp_mean    47.00
15         20   AGI_60m_ann         18 AGI_250m_seas    46.84
16         19    AGI_0m_ann         14      AGI_250m    45.94
17         18 AGI_250m_seas         14      AGI_250m    43.14
18         18 AGI_250m_seas         12        AGI_0m    40.10
19         16   AGI_0m_seas          8      ssh_mean    38.33
20         17  AGI_60m_seas         12        AGI_0m    37.36
21         16   AGI_0m_seas         13       AGI_60m    35.01
22         20   AGI_60m_ann          2     temp_mean    34.67
[1] "Calculated percent deviance"
[1] 0.6504556

[1] "TPR"
[1] 0.7204943
[1] "TSS"
[1] 0.7204452
[1] "Percent deviance calculated by in house functions"
[1] 58.80946
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3171731 0.779098 0.9424043  1.003772         0.5307919 0.5880946
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.5657433
Correlation        0.8276641
AUC                0.9660000
Per.Expl          59.1902464
cvDeviance         0.8319448
cvCorrelation      0.6737772
cvAUC              0.8851700
cvPer.Expl        39.9878669
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast    19.509438
AGI_0m        10.566285
lat            7.347935
AGI_0m_seas    6.737591
AGI_250m_ann   6.703165
temp_mean      6.651113
sal_mean       4.954255
AGI_60m_ann    3.872629
AGI_0m_ann     3.357691
bathy_mean     3.343414
AGI_60m_seas   3.305459
chl_mean       3.272516
AGI_250m       3.067225
ssh_mean       2.910558
AGI_250m_seas  2.513152
mld_mean       2.118438
AGI_60m        1.595637
vo_mean        1.438415
vostr_mean     1.413769
uostr_mean     1.378996
pred_var       1.373796
uo_mean        1.327595
bathy_sd       1.240930
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index    var2.names int.size
1          13        AGI_0m          3     temp_mean   908.36
2           3     temp_mean          1           lat   455.29
3          22   AGI_60m_ann         18   AGI_0m_seas   227.43
4          18   AGI_0m_seas         16      AGI_250m   203.39
5          13        AGI_0m         11    bathy_mean   103.94
6          13        AGI_0m          9      ssh_mean   101.83
7          21    AGI_0m_ann         18   AGI_0m_seas    87.16
8          13        AGI_0m          4      sal_mean    78.28
9          20 AGI_250m_seas         16      AGI_250m    64.62
10          6    uostr_mean          1           lat    64.42
11         20 AGI_250m_seas          4      sal_mean    60.21
12          8    vostr_mean          3     temp_mean    54.88
13         15       AGI_60m         14    dist_coast    54.37
14         15       AGI_60m         11    bathy_mean    50.84
15         22   AGI_60m_ann          1           lat    48.24
16         13        AGI_0m          1           lat    36.82
17         22   AGI_60m_ann          9      ssh_mean    36.48
18          4      sal_mean          1           lat    36.04
19          4      sal_mean          3     temp_mean    34.42
20         20 AGI_250m_seas         13        AGI_0m    32.95
21         21    AGI_0m_ann          1           lat    32.78
22         10      mld_mean          8    vostr_mean    30.54
23         18   AGI_0m_seas          9      ssh_mean    30.43
24         18   AGI_0m_seas         10      mld_mean    26.08
25         22   AGI_60m_ann          2      chl_mean    25.21
26         22   AGI_60m_ann         20 AGI_250m_seas    24.46
[1] "Calculated percent deviance"
[1] 0.6396553

[1] "TPR"
[1] 0.7215425
[1] "TSS"
[1] 0.7290931
[1] "Percent deviance calculated by in house functions"
[1] 59.19025
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3142732 0.7838679 0.9445091  1.003898         0.5385828 0.5919025

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_agi_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
brt_base_0m_seas_Nspat_Ntag 36.469 0.904 0.683 0.568 0.867 0.385 0.6641 0.867 0.989 0.364
brt_base_0m_ann_Nspat_Ntag 46.984 0.801 0.701 0.632 0.903 0.359 0.7020 0.903 0.993 0.470
brt_agi_0m_60m_250m_seas_Nspat_Ntag 40.358 0.857 0.692 0.591 0.885 0.373 0.6710 0.885 1.004 0.404
brt_agi_0m_60m_250m_seas_Yspat_Ntag 41.956 0.836 0.695 0.609 0.892 0.368 0.6820 0.892 1.003 0.420
brt_agi_0m_60m_250m_ann_Nspat_Ntag 49.801 0.765 0.708 0.664 0.917 0.347 0.7260 0.917 1.000 0.498
brt_agi_0m_60m_250m_ann_Yspat_Ntag 51.633 0.741 0.711 0.677 0.923 0.341 0.7390 0.923 1.000 0.516
brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag 58.809 0.650 0.720 0.720 0.942 0.317 0.7790 0.942 1.000 0.588
brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag 59.190 0.640 0.722 0.729 0.945 0.314 0.7840 0.945 1.003 0.591